Bridging Data and Decision-Making: Data Visualization Techniques with R

IEEE Nigeria Southeast Subsection

Ifeoma Egbogah

Drowning in Data, Starving for Insight

Story: “Too Many Reports, Not Enough Direction”

Let me tell you about Amina.

She worked in customer retention at a mid-sized logistics company. Every Monday, her inbox was flooded with CSV files—customer complaints, delivery delays, package weights, region-wise returns…

All of it collected diligently by the operations team.

But something was wrong.

Despite the data, customer churn kept rising. Leadership was frustrated. Amina felt helpless.

Until one day, she decided to stop sending spreadsheets and start telling stories with the data.

XYZ Logistics Customer Data
Month     Region  Customer Complaints  Avg. Delivery Delay (days)  Returns  Avg. Package Weight (kg)
Jan-2024  North   119                  19                          27       5.19
Feb-2024  North   129                  23                          42       2.76
Mar-2024  North   114                  8                           59       5.69
Apr-2024  North   112                  6                           43       6.25
May-2024  North   128                  6                           40       5.28
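Amina's first step can be sketched in R. The example below assumes the ggplot2 package is installed. It enters the five rows above as a data frame and turns one column into a trend line instead of another spreadsheet. The column names (month, complaints, delay_days and so on) are illustrative choices, not the company's actual schema.

```r
library(ggplot2)

# The five North-region rows from the XYZ Logistics table, typed in by hand.
# Column names are illustrative, not an official schema.
xyz <- data.frame(
  month      = as.Date(c("2024-01-01", "2024-02-01", "2024-03-01",
                         "2024-04-01", "2024-05-01")),
  region     = "North",
  complaints = c(119, 129, 114, 112, 128),
  delay_days = c(19, 23, 8, 6, 6),
  returns    = c(27, 42, 59, 43, 40),
  weight_kg  = c(5.19, 2.76, 5.69, 6.25, 5.28)
)

# A line chart shows the month-to-month trend at a glance
complaints_plot <- ggplot(xyz, aes(month, complaints)) +
  geom_line() +
  geom_point() +
  scale_x_date(date_labels = "%b %Y") +
  labs(title = "Customer Complaints, North Region, Jan to May 2024",
       x = NULL,
       y = "Complaints",
       caption = "Data: XYZ Logistics")

# print(complaints_plot) draws the chart
```

The same pattern works for delays or returns: change the y aesthetic and the title.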

XYZ Logistics

Data

What is Data?

Data refers to raw facts, figures, and statistics that are collected through observation, measurement, research, or experimentation. On their own, data have no meaning until they are organized, analyzed, and interpreted.

Key Characteristics of Data:

  • Raw: Unprocessed and unorganized

  • Factual: Based on real-world events, measurements, or records

Data Types

Numerical or Quantitative Data

Numerical (or Quantitative) Data refers to data that represents measurable quantities—that is, values that can be counted or measured and expressed in numbers.

Continuous Data

Data that can take any value within a range. Usually measured, so values can include decimals or fractions.

Examples:

  • Height of a person (e.g., 1.75 meters)

  • Temperature (e.g., 36.6°C)

  • Sales revenue (e.g., ₦1,254,500.75)

Discrete Data

Data that can take only specific, separate values. Usually counted, so values are whole numbers with no decimals.

Examples:

  • Number of employees in a company (e.g., 15, 23, 50)

  • Number of students in a classroom

  • Number of cars sold in a day
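In R the distinction shows up in how numbers are stored. The sketch below uses illustrative values; only 1.75 m and the counts 15, 23, 50 come from the examples above.

```r
height_m  <- c(1.75, 1.62, 1.80)  # continuous: measured, decimals allowed
employees <- c(15L, 23L, 50L)     # discrete: counted; the L suffix stores integers

class(height_m)        # "numeric" (double precision)
class(employees)       # "integer"
is.integer(employees)  # TRUE

# Counts read from a CSV file usually arrive as doubles. Converting them is
# safe only when every value is whole, because as.integer() truncates.
as.integer(c(15, 23, 50))  # 15 23 50
as.integer(1.75)           # 1: the decimal part is silently dropped
```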

Key Features of Numerical Data:

  • Can be compared, ordered, added, or averaged

  • Suitable for mathematical and statistical analysis

  • Often visualized using bar charts, histograms, line graphs, or scatter plots
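Each of these operations is a one-liner in base R. This sketch reuses the five monthly complaint counts from the XYZ Logistics table.

```r
complaints <- c(Jan = 119, Feb = 129, Mar = 114, Apr = 112, May = 128)

sort(complaints)                       # ordered: Apr (112) first, Feb (129) last
complaints["Feb"] > complaints["Jan"]  # compared: TRUE
sum(complaints)                        # added: 602
mean(complaints)                       # averaged: 120.4
summary(complaints)                    # min, quartiles, median (119), mean, max
```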

Categorical or Qualitative Data

Categorical (or Qualitative) Data refers to data that describes qualities or characteristics. Instead of numbers, it uses labels, names, or categories to represent information.

Key Features of Categorical Data:

  • Descriptive rather than numerical

  • Used to classify or group data

  • Cannot be meaningfully added, subtracted, or averaged

  • Can be visualized using bar charts, pie charts, or tables
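R stores categorical data as factors. In the short sketch below, the region labels are hypothetical and are not taken from the XYZ table. It shows that categories can be listed and counted, but R refuses to average them.

```r
# Illustrative region labels (hypothetical, not taken from the XYZ table)
region <- factor(c("North", "South", "North", "East", "South", "North"))

levels(region)  # "East" "North" "South": categories, sorted alphabetically by default
table(region)   # counts per category: East 1, North 3, South 2

# Categories can be counted, but not averaged:
mean(region)    # NA, with the warning "argument is not numeric or logical: returning NA"
```

The counts from table() are what a bar chart of a categorical variable displays.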

Why Data Visualization Matters

  • Humans spot patterns in a chart far faster than in rows of numbers or text

  • Visuals simplify complex data

  • Helps identify trends, outliers, and patterns

  • Supports data-driven decisions

Choosing the appropriate graph(s) for the data

Before any visualisation, always consider what kind of data each variable holds:

  • Discrete or continuous quantities
  • Categories
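This checklist can be sketched as a small R function. suggest_chart() is hypothetical and not part of any package. It only looks at how a column is stored, so the question you are asking still decides the final chart.

```r
# Hypothetical helper, not part of any package: a first-pass suggestion
# based only on how a column is stored.
suggest_chart <- function(x) {
  values <- x[!is.na(x)]
  if (is.factor(x) || is.character(x) || is.logical(x)) {
    "bar chart of counts"                 # categories
  } else if (is.numeric(x) && all(values == round(values))) {
    "bar chart or histogram"              # discrete quantities (whole numbers)
  } else if (is.numeric(x)) {
    "histogram or density plot"           # continuous quantities
  } else {
    "no suggestion"                       # e.g. dates need their own rules
  }
}

suggest_chart(c(1.75, 1.62, 1.80))  # "histogram or density plot"
suggest_chart(c(15L, 23L, 50L))     # "bar chart or histogram"
suggest_chart(c("North", "South"))  # "bar chart of counts"
```

A whole-number check is only a hint. For example, a measured weight that happens to be recorded as 5.00 kg would be treated as discrete.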

Understanding Visualization Types

Visualising numerical variables

Sometimes you are interested in the magnitude of a set of numbers, such as a total for each category. A bar chart sorted by size makes these magnitudes easy to compare.

PhDs Awarded by Field

Every year, the US government collects data on all doctoral degree recipients. The data come from the National Science Foundation (NSF).

library(tidyverse)  # dplyr, forcats, ggplot2

# Assumes phd_field holds the NSF table with broad_field, major_field and n_phds columns
# Total PhDs per broad field, with bars sorted by size
phd_total <- phd_field |>
  group_by(broad_field) |>
  summarise(phd_total = sum(n_phds, na.rm = TRUE)) |>
  mutate(broad_field = fct_reorder(broad_field, phd_total)) |>
  ggplot(aes(phd_total, broad_field)) +
  geom_col() +
  scale_x_continuous(labels = scales::comma_format()) +
  labs(title = "PhDs Awarded in the USA, 2008 to 2017",
       x = "Number of PhDs",
       y = "Field/Faculty",
       caption = "Data: NSF • Viz: Ifeoma Egbogah")

# Total PhDs per department (major field), with bars sorted by size
depart_total <- phd_field |>
  group_by(broad_field, major_field) |>
  summarise(phd_total = sum(n_phds, na.rm = TRUE), .groups = "drop") |>
  mutate(major_field = fct_reorder(major_field, phd_total)) |>
  ggplot(aes(phd_total, major_field)) +
  geom_col() +
  scale_x_continuous(labels = scales::comma_format()) +
  labs(title = "Total PhDs Awarded by Department, 2008 to 2017",
       x = "Number of PhDs",
       y = "Department",
       caption = "Data: NSF • Viz: Ifeoma Egbogah")
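With dozens of departments, the second chart can get crowded. One option is to keep only the largest fields before plotting. The sketch below assumes dplyr and forcats and the same phd_field columns (major_field, n_phds). The helper top_fields() is hypothetical.

```r
library(dplyr)
library(forcats)

# Hypothetical helper: total PhDs per major field, keeping the n largest.
# slice_max() keeps ties at the cut-off, so slightly more than n rows can come back.
top_fields <- function(data, n = 10) {
  data |>
    group_by(major_field) |>
    summarise(phd_total = sum(n_phds, na.rm = TRUE), .groups = "drop") |>
    slice_max(phd_total, n = n) |>
    mutate(major_field = fct_reorder(major_field, phd_total))  # sort bars by size
}

# Usage with the NSF table, then the same bar chart as above:
# top_fields(phd_field, n = 10) |>
#   ggplot(aes(phd_total, major_field)) +
#   geom_col() +
#   labs(x = "Number of PhDs", y = "Department")
```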

Visualising Distributions

We frequently encounter situations where we want to understand how a particular variable is distributed within a dataset. For example, in a sales dataset, we might be interested in examining the distribution of monthly revenue, the number of units sold per product, or the average transaction value. Understanding these distributions helps us identify patterns, spot outliers, and make data-driven decisions for improving business performance.
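Spotting outliers usually starts from the quartiles. Below is a base-R sketch of the 1.5 × IQR rule, which is also the rule ggplot2's geom_boxplot() uses to decide which points to draw individually, as in the Walmart box plots. iqr_outliers() is a hypothetical helper written for illustration.

```r
# Hypothetical helper: values more than 1.5 x IQR beyond the first or third quartile
iqr_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- 1.5 * (q[2] - q[1])  # 1.5 times the interquartile range
  x[!is.na(x) & (x < q[1] - fence | x > q[2] + fence)]
}

iqr_outliers(c(1:9, 100))  # 100
```

Here the quartiles are 3.25 and 7.75, so any value above 7.75 + 1.5 × 4.5 = 14.5 is flagged.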

Penguins Distribution

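library(ggplot2)
library(palmerpenguins)  # provides the penguins data frame

penguins |>
  ggplot(aes(bill_length_mm)) +
  geom_histogram(binwidth = 1, na.rm = TRUE) +  # 1 mm bins; rows with missing bill length are dropped
  labs(title = "Distribution of Penguin Bill Lengths",
       y = "Count",
       x = "Bill Length (mm)",
       caption = "Data: Palmer Penguins • Viz: Ifeoma Egbogah")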

Chart Type    Best For
Line Chart    Trends over time
Bar Chart     Comparing categories
Scatter Plot  Correlations and relationships
Maps          Geospatial data
Dashboard     Monitoring KPIs in real time
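The scatter-plot row can be tried on the penguins data used above. This sketch assumes ggplot2 and palmerpenguins are installed. Colouring by species matters here, because pooling all species together can hide how the two measurements relate within each species.

```r
library(ggplot2)
library(palmerpenguins)  # provides the penguins data frame

# Scatter plot: one point per penguin, coloured by species
scatter_plot <- ggplot(penguins, aes(bill_length_mm, bill_depth_mm, colour = species)) +
  geom_point(na.rm = TRUE) +
  labs(title = "Bill Length vs Bill Depth",
       x = "Bill Length (mm)",
       y = "Bill Depth (mm)",
       colour = "Species",
       caption = "Data: Palmer Penguins")

# Pearson correlation over all penguins, skipping rows with missing measurements
r <- cor(penguins$bill_length_mm, penguins$bill_depth_mm, use = "complete.obs")
```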

Tip: Choose simplicity and clarity over complexity.

Real World Scenario: Walmart and Weather

Walmart

Context: We can’t control the weather, yet it affects everything we do. Most days are calm, but when Mother Nature shows her power, normal life is turned upside down.

In these moments, how can businesses keep operating and support communities?

In 2004, as Hurricane Frances roared through the Caribbean toward Florida, Walmart’s team asked themselves: how can we prepare smarter, and which products will people desperately need?

Walmart used historical sales data to anticipate what people would need most.

Insight: While obvious necessities like bottled water, flashlights, and batteries were expected, the data revealed a surprising pattern: sales of strawberry Pop-Tarts increased sevenfold ahead of a hurricane, and beer became the top-selling item.

Action Taken: Walmart shipped extra stock of these items to its stores in the hurricane’s path.

Walmart: Summary statistics of active inventory performance

library(tidyverse)  # dplyr, ggplot2 and lubridate (tidyverse >= 2.0)

# Assumes wal_train is train.csv from Kaggle's "Walmart Recruiting II: Sales in
# Stormy Weather" competition (columns: date, store_nbr, item_nbr, units)
item_stat <- wal_train |>
  filter(units > 0, units < 2000) |>  # active sales only; drop extreme values
  ggplot(aes(factor(item_nbr), units)) +
  geom_boxplot(outlier.shape = 16, outlier.size = 1.5, fill = "#f8f8f8", color = "#88398A") +
  guides(x = guide_axis(angle = 90)) +  # rotate the crowded item labels
  labs(title = "Distribution of Items Sold",
       x = "Item Number",
       y = "Units Sold",
       caption = "Data: Kaggle • Viz: Ifeoma Egbogah")

# One panel per store, stores 1 to 5
store <- wal_train |>
  filter(units > 0, store_nbr %in% 1:5) |>
  mutate(store = glue::glue("Store {store_nbr}")) |>
  ggplot(aes(units, factor(item_nbr))) +
  geom_boxplot(outlier.shape = 16, outlier.size = 1.5, fill = "#f8f8f8", color = "#88398A") +
  facet_wrap(~store, scales = "free", nrow = 1) +
  labs(title = "Distribution of Units Sold by Item across Stores 1 to 5",
       y = "Item Number",
       x = "Units Sold",
       caption = "Data: Kaggle • Viz: Ifeoma Egbogah") +
  theme_minimal() +
  theme(plot.title = element_text(colour = "#562457", face = "bold", hjust = 0.5),
        strip.text = element_text(face = "bold"),
        axis.text.y = element_text(size = 6))

# Rows = year, columns = store; year() comes from lubridate and needs a Date column
store_by_year <- wal_train |>
  mutate(year = year(date)) |>
  filter(units > 0, store_nbr %in% 1:5) |>
  ggplot(aes(units, factor(item_nbr))) +
  geom_boxplot(outlier.shape = 16, outlier.size = 1.5, fill = "#f8f8f8", color = "#88398A") +
  facet_grid(rows = vars(year), cols = vars(store_nbr), scales = "free_y",
             labeller = labeller(store_nbr = function(x) paste("Store", x))) +
  labs(title = "Distribution of Units Sold by Item across Stores 1 to 5, by Year",
       y = "Item Number",
       x = "Units Sold",
       caption = "Data: Kaggle • Viz: Ifeoma Egbogah") +
  theme(plot.title = element_text(colour = "#562457", face = "bold", hjust = 0.5),
        strip.text = element_text(face = "bold"),
        axis.text.y = element_text(size = 6))
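The box plots summarise each item visually. The same summary statistics can also be tabulated for the inventory team. This is a sketch that assumes dplyr and the wal_train columns used above (item_nbr, units), with one row per store, item and day. item_summary() is a hypothetical helper.

```r
library(dplyr)

# Hypothetical helper: the numbers behind each box, one row per item
item_summary <- function(data) {
  data |>
    filter(units > 0) |>
    group_by(item_nbr) |>
    summarise(
      store_days   = n(),                    # store-days with at least one sale
      q1_units     = quantile(units, 0.25),
      median_units = median(units),
      q3_units     = quantile(units, 0.75),
      max_units    = max(units),
      .groups = "drop"
    ) |>
    arrange(desc(median_units))              # best sellers first
}

# item_summary(wal_train)
```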

Real World Scenario: Nigerian Breweries

COVID-19

COVID-19 dashboards helped governments spot infection spikes, track vaccination progress, and respond quickly.

Bridging the Gap Between Data and Decisions

Mind the Gap

Problem: Data is abundant, but insights are scarce.

Solution: Visualization bridges the gap between raw data and strategic action.

Outcome: Simplifies storytelling and supports real-time decisions.

What is R and Why Use It?

R

  • Free and open-source statistical language

  • Used in academia and business

  • Integrates data wrangling, analysis, and visualization

Key Visualization Packages:

  • ggplot2: layered static graphics

  • plotly: interactive charts

  • shiny: interactive web apps and dashboards
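Here is a small illustration of how the packages fit together: plotly's ggplotly() turns an existing ggplot2 chart into an interactive one. The sketch assumes both packages are installed. mtcars is used only because it ships with R. shiny can then place charts like this in a web app with inputs such as drop-downs and sliders.

```r
library(ggplot2)
library(plotly)

# Any static ggplot works; mtcars ships with R
p <- ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# ggplotly() converts the ggplot into an interactive htmlwidget
# (hover tooltips, zoom, pan) viewable in RStudio or a browser
interactive_p <- ggplotly(p)
```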

Data Visualization

Case Study

Academic Use Case – Education Access

Dataset: World Bank (Literacy vs Internet Access)

Visualization: Scatter plot of adult literacy rate against internet access, two indicators of socio-economic development.

Insight: Nigeria lags behind Kenya and Egypt in internet penetration despite comparable literacy rates.
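Below is a sketch of how such a chart could be built in R. plot_literacy_internet() is a hypothetical helper. The commented data-fetching lines assume the WDI package, internet access, and two World Bank indicator codes: SE.ADT.LITR.ZS (adult literacy rate) and IT.NET.USER.ZS (individuals using the Internet, % of population). Literacy figures are published irregularly, so the most recent year with both values may differ by country.

```r
library(ggplot2)

# Hypothetical helper: expects one row per country with columns
# country, literacy (% of adults) and internet (% of population)
plot_literacy_internet <- function(df) {
  ggplot(df, aes(literacy, internet, label = country)) +
    geom_point(size = 3) +
    geom_text(vjust = -1) +  # country name above each point
    labs(x = "Adult literacy rate (%)",
         y = "Individuals using the Internet (%)",
         caption = "Data: World Bank WDI")
}

# One way to fetch the data (assumes the WDI package and a network connection):
# wb <- WDI::WDI(country = c("NG", "KE", "EG"),
#                indicator = c(literacy = "SE.ADT.LITR.ZS",
#                              internet = "IT.NET.USER.ZS"),
#                start = 2015, end = 2022)
# latest <- wb |>
#   dplyr::filter(!is.na(literacy), !is.na(internet)) |>
#   dplyr::slice_max(year, n = 1, by = country)
# plot_literacy_internet(latest)
```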